Big Data Methods for Computational Linguistics
Authors
Abstract
Many tasks in computational linguistics traditionally rely on hand-crafted or curated resources such as thesauri or word-sense-annotated corpora. The availability of big data, from the Web and other sources, has changed this situation. Harnessing these assets requires scalable methods for data and text analytics. This paper gives an overview of our recent work that uses big data methods to enhance semantics-centric tasks dealing with natural language texts. We demonstrate a virtuous cycle: harvesting knowledge from large data and text collections, and leveraging this knowledge to improve the annotation and interpretation of language in Web pages and social media. Specifically, we show how to build large dictionaries of names and paraphrases for entities and relations, and how these help to disambiguate entity mentions in texts.
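To make the dictionary-then-disambiguate idea concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the scoring. It assumes a toy list of (surface name, entity) pairs standing in for link anchors harvested at scale (e.g., from Wikipedia), builds a name dictionary with counts, and then ranks candidates by a weighted mix of a popularity prior and keyword overlap with the context. All data, names (`ANCHOR_PAIRS`, `ENTITY_KEYWORDS`, `build_name_dictionary`, `disambiguate`) and the weight `alpha` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy input: (surface name, linked entity) pairs standing in for
# link anchors harvested at scale, e.g. from Wikipedia. Values are invented.
ANCHOR_PAIRS = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"),
    ("City of Light", "Paris"),  # a paraphrase mapping to the same entity
    ("Kashmir", "Kashmir"),
    ("Kashmir", "Kashmir"),
    ("Kashmir", "Kashmir_(song)"),
]

# Hypothetical keyword profiles describing each candidate entity.
ENTITY_KEYWORDS = {
    "Paris": {"france", "city", "seine"},
    "Paris_Hilton": {"hotel", "heiress", "celebrity"},
    "Kashmir": {"region", "india", "pakistan"},
    "Kashmir_(song)": {"led", "zeppelin", "song", "album"},
}


def build_name_dictionary(pairs):
    """Map each lower-cased surface name (name or paraphrase) to entity counts."""
    dictionary = defaultdict(Counter)
    for name, entity in pairs:
        dictionary[name.lower()][entity] += 1
    return dictionary


def disambiguate(mention, context, dictionary, keywords, alpha=0.5):
    """Return the candidate entity with the best mix of prior and context overlap.

    alpha weights the popularity prior P(entity | name); 1 - alpha weights the
    fraction of the entity's keywords that occur in the context.
    """
    candidates = dictionary.get(mention.lower())  # .get does not insert keys
    if not candidates:
        return None  # name not in the dictionary
    total = sum(candidates.values())
    context_words = set(re.findall(r"\w+", context.lower()))

    def score(entity):
        prior = candidates[entity] / total
        profile = keywords.get(entity, set())
        overlap = len(profile & context_words) / len(profile) if profile else 0.0
        return alpha * prior + (1 - alpha) * overlap

    return max(candidates, key=score)


if __name__ == "__main__":
    names = build_name_dictionary(ANCHOR_PAIRS)
    print(disambiguate("Kashmir", "Led Zeppelin played Kashmir from the album",
                       names, ENTITY_KEYWORDS))  # → Kashmir_(song)
    print(disambiguate("City of Light", "", names, ENTITY_KEYWORDS))  # → Paris
```

The prior alone would always pick the most frequent sense (here, the region for "Kashmir"); adding context overlap lets a less popular entity win when the surrounding text supports it, while the paraphrase entry "City of Light" shows how a richer name dictionary widens coverage.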
Similar resources
A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range
Metadata is a special type of data that describes data. In the age of Big Data, the role of metadata has become more prominent–it is obvious that big data needs high-quality metadata description as it becomes less and less possible for humans to go over all the data (if human readable) with the exponential growth of data sets. In this study we try to enhance metadata records (publication dates)...
CELEX: Building a Multifunctional, Polytheoretical Lexical Database
Recent developments in Computational Linguistics have brought about an increasing interest in large scale lexical modules, at a time when current trends in hardware and software engineering bring this goal within reach. This paper describes one such system, the CELEX database. For expository purposes only, this system is contrasted with another big project that starts from different premiss...
Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey
Automatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of ma...
Does a Computational Linguist have to be a Linguist?
Early computational linguists supplied much of the theoretical basis that the ALPAC report said was needed for research on the practical problem of machine translation. The result of their efforts turned out to be more fundamental in that it provided a general theoretical basis for the study of language use as a process, giving rise eventually to constraint-based grammatical formalisms for syntax, ...
Journal:
- IEEE Data Eng. Bull.
Volume 35, Issue
Pages: -
Publication date: 2012